OpenLineage: Add core lineage service scaffolding #4705
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Introduces a new lineage service boundary and supporting models/configuration, gated behind static and realm feature flags, with a placeholder persistence implementation.
Changes:
- Adds core lineage API models (ingest/query requests, graph/node/edge types) and SPI (LineagePersistence, LineageService).
- Implements DefaultLineageService with enablement checks (static config + realm feature + persistence flag) and adds a disabled persistence placeholder.
- Adds unit tests for enablement gating and persistence delegation, and introduces a new realm feature flag ENABLE_LINEAGE.
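The enablement checks summarized above (static config, then realm feature, then persistence flag) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `LineageGateSketch`, its constructor parameters, the choice of `IllegalStateException`, and the second and third messages are hypothetical and not taken from the PR; only the first message appears in the diff quoted in this thread.

```java
import java.util.function.BooleanSupplier;

/** Illustrative three-step enablement gate; all names here are hypothetical. */
final class LineageGateSketch {
  private final boolean staticEnabled;        // stands in for polaris.lineage.enabled
  private final BooleanSupplier realmFeature; // stands in for the ENABLE_LINEAGE realm flag
  private final boolean persistenceEnabled;   // stands in for the persistence toggle

  LineageGateSketch(
      boolean staticEnabled, BooleanSupplier realmFeature, boolean persistenceEnabled) {
    this.staticEnabled = staticEnabled;
    this.realmFeature = realmFeature;
    this.persistenceEnabled = persistenceEnabled;
  }

  /** Throws on the first closed gate, so the message names the setting to change. */
  void checkEnabled() {
    if (!staticEnabled) {
      throw new IllegalStateException(
          "Lineage is disabled: set polaris.lineage.enabled=true to enable it.");
    }
    if (!realmFeature.getAsBoolean()) {
      // Assumed message; the PR's wording for this gate is not shown in the thread.
      throw new IllegalStateException("Lineage is disabled for this realm (ENABLE_LINEAGE).");
    }
    if (!persistenceEnabled) {
      // Assumed message; the PR's wording for this gate is not shown in the thread.
      throw new IllegalStateException("Lineage persistence is disabled.");
    }
  }

  public static void main(String[] args) {
    new LineageGateSketch(true, () -> true, true).checkEnabled(); // all gates open
    try {
      new LineageGateSketch(true, () -> false, true).checkEnabled();
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Checking the cheapest, most global gate first means a fully disabled deployment never evaluates the per-realm lookup.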
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| runtime/service/src/test/java/org/apache/polaris/service/lineage/DefaultLineageServiceTest.java | Adds unit coverage for lineage enablement checks and delegation to persistence. |
| runtime/service/src/main/java/org/apache/polaris/service/lineage/LineageConfiguration.java | Adds SmallRye config mapping for lineage feature and persistence toggles. |
| runtime/service/src/main/java/org/apache/polaris/service/lineage/DisabledLineagePersistence.java | Adds placeholder LineagePersistence that fails fast when invoked. |
| runtime/service/src/main/java/org/apache/polaris/service/lineage/DefaultLineageService.java | Adds request-scoped lineage service implementation and enablement gates. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageService.java | Defines a core service boundary for lineage operations. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageQueryRequest.java | Adds request model for lineage graph queries. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineagePersistence.java | Adds persistence SPI contract for lineage backends. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageNodeType.java | Adds enum for node kinds in lineage graphs. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageNode.java | Adds node model for lineage graph responses. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageIngestRequest.java | Adds normalized ingest payload model. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageGraph.java | Adds normalized lineage query response model. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageGranularity.java | Adds enum for dataset vs column granularity queries. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageFieldReference.java | Adds model for dataset field references used in column edges. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageFieldMapping.java | Adds mapping model for column-granularity responses. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageEdge.java | Adds dataset-to-dataset edge model. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageDirection.java | Adds enum for query direction. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageDataset.java | Adds dataset identity model for lineage. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageData.java | Adds response metadata wrapper for datasets. |
| polaris-core/src/main/java/org/apache/polaris/core/lineage/LineageColumnEdge.java | Adds column-level edge model. |
| polaris-core/src/main/java/org/apache/polaris/core/config/FeatureConfiguration.java | Adds realm feature flag ENABLE_LINEAGE. |
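
The table describes DisabledLineagePersistence as a placeholder that fails fast when invoked. A minimal sketch of that pattern, assuming a much-simplified SPI, might look like the following. The interface `PersistenceSketch`, its single method `upstreamOf`, and the exception type are hypothetical; the actual LineagePersistence contract is not shown in this thread.

```java
import java.util.List;

/** Illustrative fail-fast placeholder; the interface below is hypothetical. */
final class DisabledPersistenceSketch {
  /** Simplified stand-in for a lineage persistence SPI. */
  interface PersistenceSketch {
    List<String> upstreamOf(String dataset);
  }

  /** Placeholder backend: every call throws instead of returning an empty result. */
  static final class Disabled implements PersistenceSketch {
    @Override
    public List<String> upstreamOf(String dataset) {
      throw new UnsupportedOperationException(
          "Lineage persistence is not configured; cannot query " + dataset);
    }
  }

  public static void main(String[] args) {
    PersistenceSketch persistence = new Disabled();
    try {
      persistence.upstreamOf("db.orders");
    } catch (UnsupportedOperationException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Throwing rather than returning an empty list keeps a misconfigured deployment from silently reporting that a dataset has no lineage.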
Thanks for this, @iting0321! However, I think we are trying to do too much in this PR. Can we remove the persistence-related models, as we may need a bit more time to reach consensus on those bits? I understand that there will be no callers of these models as a result, but we will still need this in case of both persistence and passthrough models regardless. Given that nothing in here would be considered a "public interface" IMO, we should be OK to change it later down the line, if needed.
I just removed the persistence-related models (including the ingest model).
import java.util.OptionalLong;

/** Dataset metadata returned in a lineage query response. */
public record LineageData(
These new classes do not appear to be required / used by the old polaris-core code.
In the spirit of modularization, I'd like to propose defining them in a separate gradle module / jar.
Are there any plans for using these new classes in old polaris-core code?
Thanks for your feedback! I don’t currently see these classes being consumed by existing polaris-core code paths; they appear to be used only by the new lineage service boundary. So I agree they can be moved into a separate lineage module unless we expect core components to depend on them soon.
I am not convinced this should merge before we know which implementation contract it is serving. A core lineage query/service model without a working implementation risks freezing API concepts before we have validated the actual persistence/query semantics. In particular, the local-store discussion is still open around reduced graph semantics, “latest” edge replacement, auth-filtered reads, and backend capability boundaries. If this PR is intended only as neutral scaffolding, I think it should be very explicit about what is not being committed here, and avoid introducing model/API shapes that later persistence PRs depend on as settled. Otherwise we should probably review this together with the first implementation that proves the model is sufficient.
Hi @snazy, thanks for clarifying. As with #4827, I’ll continue following the dev@ discussion and update the PR based on the resulting consensus.
package org.apache.polaris.lineage;

/** Service boundary for lineage operations used by transport-layer adapters. */
public interface LineageService {
Confused: how is this different from PR #4826?
PR #4826 adds the lineage persistence SPI and related persistence/ingest models, including PolarisLineageHandler, LineageStoreManager, LineageDataset, LineageColumnEdge, LineageEdge, LineageFieldReference, and LineageIngestRequest.
suites {
  val intTest by registering(JvmTestSuite::class)

/** Normalized response model for lineage queries. */
public record LineageGraph(
This class and some others overlap with #4826 ... I'm a bit confused about the intended merge order 😅 Which PR should be merged first?
If there are dependencies, it would be nice to keep dependent PRs in the "draft" state for the sake of clarity.
Thanks for the heads-up! This is the first PR and #4826 is the second. I will also add the order at the beginning of the PR description and turn the others into drafts.
"Lineage is disabled: set polaris.lineage.enabled=true to enable it.");
}

if (!callContext.getRealmConfig().getConfig(FeatureConfiguration.ENABLE_LINEAGE)) {
Why not inject just the RealmConfig (reduced dependency surface)?
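The suggestion above can be illustrated with a small sketch in which the guard depends only on the config lookup it actually reads. All types here (`RealmConfigSketch`, `CallContextSketch`, `LineageGuard`) are hypothetical stand-ins and do not reproduce the real Polaris RealmConfig or CallContext APIs.

```java
/** Illustrative comparison of dependency surfaces; all types are hypothetical stand-ins. */
final class NarrowInjectionSketch {
  /** Stand-in for a realm-scoped config lookup. */
  interface RealmConfigSketch {
    boolean isLineageEnabled();
  }

  /** Stand-in for a wider call context; shown only for contrast, not used below. */
  interface CallContextSketch {
    RealmConfigSketch getRealmConfig();

    String getRealmId();
  }

  /** Depends only on the config it reads, not on the whole call context. */
  static final class LineageGuard {
    private final RealmConfigSketch realmConfig;

    LineageGuard(RealmConfigSketch realmConfig) {
      this.realmConfig = realmConfig;
    }

    boolean lineageEnabled() {
      return realmConfig.isLineageEnabled();
    }
  }

  public static void main(String[] args) {
    // A test or caller can supply the narrow dependency as a one-line lambda.
    LineageGuard guard = new LineageGuard(() -> true);
    System.out.println(guard.lineageEnabled());
  }
}
```

With the narrower constructor, tests need no call-context fixture, and the class signature documents exactly what the guard reads.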
@@ -26,6 +26,7 @@ plugins {

dependencies {
  implementation(project(":polaris-core"))
  implementation(project(":polaris-extensions-lineage"))
Cross-post from #4826 (comment)
I believe runtime/server should have this as a runtimeOnly dep.
The "service" module should not have hard dependencies (tests are OK) on additional API/feature modules. This is to avoid forcing new features on all downstream projects (which have to depend on runtime/service).
Please note that this PR is the first of the three and still depends on #4826 and #4827.
Description
This PR introduces the core query/response models, a LineageService boundary, runtime service/config wiring, and disabled-by-default feature guards. It does not add persistence, ingest parsing, REST endpoints, RBAC, or forwarding.
Changes
- polaris-core lineage query/response models for query requests, graph responses, nodes, node data, field mappings, direction, granularity, and node type in polaris-core/.../lineage/.
- LineageService query boundary in polaris-core/.../lineage/.
- DefaultLineageService and LineageConfiguration in runtime/service/.../lineage/.
- ENABLE_LINEAGE feature flag, defaulting to disabled, in polaris-core/.../config/FeatureConfiguration.java.
- polaris.lineage.enabled in site/content/in-dev/unreleased/configuration/config-sections/.
- runtime/service/.../DefaultLineageServiceTest.java.
Out of Scope
Checklist
- CHANGELOG.md (if needed)
- site/content/in-dev/unreleased (if needed)